Abstract

This paper explains why scripted dialogue shares some 
crucial properties with discourse. In particular, when 
scripted dialogues are generated by a Natural Language 
Generation system, the generator can apply revision 
strategies that cannot normally be used when the dia- 
logue results from an interaction between autonomous 
agents (i.e., when the dialogue is not scripted). The pa- 
per explains that the relevant revision operators are best 
applied at the level of a dialogue plan and discusses how 
the generator may decide when to apply a given revision 
operator. 

Controlling Global Properties in Text 
Generation 

A Natural Language Generation (NLG) system puts information
into words. When doing this, the system makes a
large number of decisions as to the specific way in which 
this is done: aggregating information into paragraphs and 
sentences, choosing one syntactic pattern instead of another, 
deciding to use one word rather than another, and so on. For 
many purposes, such decisions can be made on the basis 
of local information. The choice between an active and a 
passive sentence, for example, usually does not take other 
decisions (such as the analogous choice involving another 
sentence) into account. In slightly more difficult cases, the 
decisions of the generator can be based on decisions that 
have been taken earlier. For example, the generator may in- 
spect the linguistic context to the left of the generated item 
for deciding between using a proper name, a definite de- 
scription, or a personal pronoun. There are, however, sit- 
uations in which generative decisions require information 
about text spans that have not yet been generated. Such 
situations typically arise when the generated text is subject 
to global constraints, i.e., constraints on the text as a whole,
such as its length. For example, suppose there is a length
constraint. Now, in order to decide whether it is necessary
to use aggregation when generating some subspan (to stay 
beneath the maximum length), it is necessary to know the 
length of the text outside of the current subspan. This can 
lead to complications since at the point in time when the 
current span is generated, the rest of the text might not yet
be available.¹ One class of global NLG constraints involves
the linguistic style of a text, as regards its degree of for- 
mality, detail, partiality, and so on (Hovy 1988). Whether 
a text can be regarded as moderately formal, for instance, 
depends on whether formal words and patterns are chosen 
moderately often, which makes this a quintessentially global 
constraint. Moreover, style constraints may contradict each 
other. For example, if a text has to be impartial as well as
informal, then the first of these constraints may be accommodated
by choosing a passive construction, but the second
would tend to be adversely affected by that same choice.
Hovy argues that these problems make top-down planning 
difficult because it is hard to foresee what 'fortuitous oppor- 
tunities' will arise during later stages of NLG. Perhaps more 
contentiously, he argues that these problems necessitate a 
monitoring approach, which keeps constant track of the de- 
gree to which a given text span satisfies each of a number of 
constraints (e.g., low formality, low partiality). After gener- 
ating the first n sentences of the text, the remainder of the 
text is generated in a way that takes the degrees of satisfac- 
tion for all the style constraints into account, for example 
by favouring those constraints that have been least-recently 
satisfied (or least-often satisfied of the text span in its total- 
ity). Monitoring is an attempt to address global constraints 
in an incremental fashion and may be viewed as a plausible 
model of spontaneous speech. It may be likened to steering a 
ship: when the ship is going off course, you adjust its direc- 
tion. Needless to say, there is no guarantee that monitoring 
will result in a happy outcome, but it is a computationally
affordable approach to a difficult problem.

¹Compare the constraint, in daily life, that a party has to fit into
one's living room. This constraint can lead to interdependencies
between decisions. For instance, suppose the room can contain ten
visitors, and eleven invitations have already gone out. Is it wise
to send out another invitation? This depends on how many of the
invitations will be accepted, which can be difficult to foresee. Another,
more qualitative constraint may be that the party has to be
pleasant, and this may introduce further interdependencies (e.g., if
Mr. X comes to the party then Mrs. Y had better not be invited).

As we have seen, another example of a global constraint 
is the constraint to generate a text of a specified size (e.g., 
in terms of the number of words or characters used). Reiter 
(2000) discusses various ways in which the size of a gen- 
erated text may be controlled. We already saw that for cer- 
tain generation decisions it might be necessary to know the 
length of the remainder of the text. Based on experiments 
with the STOP system, Reiter observes that the size of a text 
is difficult to estimate (i.e., predict) on the basis of an ab- 
stract document plan. He argues that revision of texts, based 
on measuring the size of an actually generated text, is the 
way to go: when the size measure indicates that the text is
too large, the least important message in the sentence plan
is deleted and the text is regenerated.
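The revise-and-regenerate loop just described can be illustrated with a minimal sketch. It is not Reiter's implementation: the representation of a message as a (text, importance) pair, the stand-in `realize` function and the use of a word count as the size measure are all assumptions made here for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical message representation: (text, importance score).
Message = Tuple[str, float]


def realize(messages: List[Message]) -> str:
    """Stand-in realizer: simply concatenates the message texts."""
    return " ".join(text for text, _ in messages)


def revise_to_size(messages: List[Message], max_words: int,
                   realizer: Callable[[List[Message]], str] = realize) -> str:
    """Generate a draft, measure it, and while it is too long drop the
    least important message and regenerate."""
    remaining = list(messages)
    text = realizer(remaining)
    while remaining and len(text.split()) > max_words:
        least_important = min(remaining, key=lambda m: m[1])
        remaining.remove(least_important)
        text = realizer(remaining)  # size is measured on real output
    return text
```

The point of the sketch is that size is measured on the generated text itself, never estimated from the abstract plan.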

Revision has been claimed to be a good model of human 
writing (e.g., Hayes and Flower 1986). In some respects, 
size is a particularly difficult kind of global constraint, since 
it applies to the form rather than the content of a text, and 
this is why Reiter and colleagues let revision wait until a 
draft of the text has been generated and evaluated (i.e., after 
its size has been measured). In other respects, it is relatively 
simple, because it is one-dimensional and straightforward to 
define. 

In this paper, we discuss a class of non-local decisions 
in the generation of dialogue that were first introduced in 
Piwek and van Deemter (2002). To be able to make these 
decisions in a well-motivated way, without complicating the 
NLG system unduly, we adopt a revision approach, which 
makes our system more similar to Reiter's STOP system than 
to Hovy's PAULINE. Like STOP, we use a revision strategy, 
but unlike STOP, we do not need to evaluate the textual out- 
put of the generator since, in our case, evaluation can be 
done on the basis of an abstract representation of the dia- 
logue content. 

Non-local decision making in Dialogue 
Generation 

Let us define more precisely what we mean by non-local 
decisions in generation: 

(NON-LOCAL DECISIONS IN GENERATION) Given a
generator G that is producing some subspan Sa of a
document D, we will call a decision by G concerning
the generation of Sa non-local if it requires information
regarding the content and/or realization of some other
span Sb in D, where some or all of Sb is not included in
Sa.

Given a left-to-right generation algorithm, there is no problem
if Sb precedes Sa: the content and form of Sb can be stored
so that they are available when Sa is generated. However,
if all or part of Sb follows Sa (i.e., has not yet been generated),
we have a problem. In the preceding section, three
solutions to this problem were exemplified: one was
based on estimating the relevant properties of Sb, one on revising
Sa, and one on constantly monitoring the satisfaction
of the global properties that need to be satisfied.

Note that revision is a strategy which has to be treated 
with care. For instance, we need to make sure that never-ending
cycles of revision do not occur. Such a cycle would,
for instance, occur if a particular revision operation always 
created circumstances which allowed it to be applied again. 
We return to this issue later on in this paper. 

In the remainder of this paper we discuss what options are 
available if the text that needs to be generated is a dialogue. 

Investigations into dialogue by computational linguists
have typically focussed on the communication between the 
interlocutors that take part in the dialogue. The dialogue is 
taken to consist of communicative interaction between two 
individuals. There are, however, many situations where a 
dialogue fulfils a different purpose: dialogues in the theatre, 
on television (in drama, sitcoms, commercials, etc.) and on 
the radio (in radio plays, commercials, etc.) are not primarily
meant as vehicles for genuine communication between
the interlocutors, but rather aim to affect the audience
of the dialogue (in the theatre, in front of the TV or radio).
The effects are often achieved through global properties of 
the dialogue (the dialogue should take only a certain amount 
of time, should be entertaining, should teach the audience 
something, should make a certain point forcefully, etc.). In 
short, if we look at the dialogue from the perspective of an 
audience, global properties of the dialogue are of great im- 
portance. 

The Information State approach 

Firstly, let us consider the currently prevalent approach to 
generating dialogue. The starting point is the use of two 
autonomous agents, say A and B. These agents are asso- 
ciated with Information States (e.g., Traum et al., 1999): 
IS(A) and IS(B). The agent who holds the turn generates
an utterance based on its Information State. This leads
to an update of the Information States of both agents. Subsequently,
whichever agent holds the turn in the new Information
States produces the next utterance. In most implemented
systems, one of the agents is a dialogue system and
the other a human user; the approach has, however, also been 
used for dialogue simulations involving two or more soft- 
ware agents (as pioneered by Power, 1979). 
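The turn-by-turn regime described in this paragraph can be sketched as follows. The sketch is a deliberately minimal toy, not a rendering of any existing Information State toolkit: the `InformationState` fields, the `produce` function and the restriction to two agents are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InformationState:
    """Private state of one agent (hypothetical, minimal model)."""
    owner: str
    agenda: List[str]        # what this agent still intends to say
    turn_holder: str         # who this agent believes holds the turn
    history: List[str] = field(default_factory=list)


def produce(state: InformationState) -> Optional[str]:
    """Generate an utterance from the speaker's own state only."""
    return state.agenda.pop(0) if state.agenda else None


def run_dialogue(states: Dict[str, InformationState],
                 max_turns: int = 20) -> List[str]:
    """Two-agent loop: the turn holder speaks, then both states are updated."""
    transcript: List[str] = []
    names = list(states)
    for _ in range(max_turns):
        speaker = states[names[0]].turn_holder
        utterance = produce(states[speaker])
        if utterance is None:
            break
        line = f"{speaker}: {utterance}"
        transcript.append(line)
        listener = next(n for n in names if n != speaker)
        for state in states.values():  # the utterance updates both states
            state.history.append(line)
            state.turn_holder = listener
    return transcript
```

Note that `produce` never looks at the other agent's agenda, and that a completed turn is only ever appended to the transcript.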

For our purposes, it is important to note that the agents 
only have access to their own Information State, and can 
only use this state to produce contributions. This has some 
repercussions when it comes to controlling global properties 
of dialogue. Estimation becomes more difficult if it involves 
a span that is to be generated by the other agent. The agent 
has no access to the Information State of this agent and will 
therefore find it more difficult to estimate what it is going 
to say. Furthermore, estimating one's own future utterances
can become more difficult, since they may follow and depend
on utterances by the other agents. In general, estimation
for the purpose of coordinating the generation of the current
span with not yet generated spans is more difficult and less reliable
in the Information State approach.

Revision is also much more limited if the Information 
State approach is strictly followed. Revision is only pos- 
sible within a turn. The Information State approach assumes 
that turns are produced according to their chronological or- 
dering, and hence it is not possible to go back to a turn once 
it has been completed. 



The Dialogue Scripting approach 

An alternative to the Information State approach is the dia- 
logue scripting approach. In Piwek & Van Deemter (2002) 
we take the main characteristic of this approach to be that 
it involves the creation of a dialogue (script) by one single 
agent, i.e., the script author. Thus, the production of the dialogue
is seen as analogous to single-author text generation.

The automated generation of scripted dialogue has been
pioneered by André et al. (2000). We follow André et al.
(2000) in distinguishing between the creation of the dialogue
text, i.e., the script, and the performance of this script. Of
course the Information State approach also lends itself to
such a separation, but typically the authoring and performance
functions are taken care of by the same agent (the
interlocutors) and take place at the same time. In the script- 
ing approach, the script for the entire dialogue is produced 
first. The performance of this script takes place at a later 
time, typically by actors who are different from the author. 

There are at least two reasons why the scripting approach 
is better suited to creating dialogues with certain global 
properties than Information State approaches. Firstly, in the 
scripted dialogue approach the information and control reside
with one author. This makes estimation more reliable,
assuming that it is easier to predict one's own actions than
those of another agent. Secondly, the scripting approach
does not presuppose that the dialogue is created in the same 
temporal order in which it is to be performed. Hence it is 
possible to revisit spans of the dialogue and edit them. 

In between traditional Information State and Scripted Di- 
alogue approaches, hybrid approaches are possible. For in- 
stance, one might generate a dialogue according to the In- 
formation State approach, and then edit this draft with a sin- 
gle author. The techniques described in the next section are 
presented in the context of pure Dialogue Scripting but they 
remain largely valid for more hybrid set-ups. 

Despite the existence of hybrid approaches it is impor- 
tant to keep in mind the different perspectives from which 
Information State and Scripted Dialogue approaches arose: 
the Information State approach focuses on the communica- 
tion between the interlocutors in the dialogue, whereas the 
scripted dialogue approach focuses on the communication 
between the script author and the readers/audience of the di- 
alogue; the communication between the interlocutors of the 
scripted dialogue is only pretended communication. 

Exploring the Control of Global Dialogue 
Properties in NECA 

In this section we discuss the control of global dialogue 
properties in the NECA system.² NECA generates Dialogue
Scripts that can be performed by animated characters. Currently,
a prototype called eShowroom exists for the generation
of car sales dialogues; a prototype for a second domain,
Socialite, involving social chatting, is under construction.
Here we focus on eShowroom, which will be featured
on an internet portal for car sales.

²NECA is an EU-IST funded project which started in October
2001 and has a duration of 2.5 years. NECA stands for 'Net Environment
for Embodied Emotional Conversational Agents'. The following
partners are involved in the project: DFKI, IPUS (University
of the Saarland), ITRI (University of Brighton), OFAI (University
of Vienna), Freeserve and Sysis AG. Further details can be found
at: http://www.ai.univie.ac.at/NECA/.

The NECA eShowroom system 

The eShowroom demonstrator allows a user to browse a 
database of cars, select a car, select two characters and their 
attributes, and subsequently view an automatically generated
film of a dialogue between the characters about the
selected car. The eShowroom system is provided with the
following information as its input: 

• A database with facts about the selected car (maximum 
speed, horse power, fuel consumption, etc.). 

• A database which correlates facts with value dimensions 
such as 'sportiness', 'environmental-friendliness', etc. 
(e.g., a high maximum speed is good for 'sportiness', high 
gasoline consumption is bad for the environment). 

• Information about the characters: 

- Personality traits such as extroversion and agreeable- 
ness. 

- Personal preferences concerning cars (e.g., a prefer- 
ence for cars that are friendly for the environment). 

- Role of the character (either seller or customer). 

This input is processed in a pipeline that consists of the fol- 
lowing modules: 

1. A Dialogue Planner, which produces an abstract description
of the dialogue (the dialogue plan).

2. A multi-modal Natural Language Generator which spec- 
ifies linguistic and non-linguistic realizations for the dia- 
logue acts in the dialogue plan. 

3. A Speech Synthesis Module, which adds information for
speech.

4. A Gesture Assignment Module, which controls the tem- 
poral coordination of gestures and speech. 

5. A player, which plays the animated characters and the cor- 
responding speech sound files. 

Each step in the pipeline adds more concrete information 
to the dialogue plan/script until finally a player can render 
it (see also Krenn et al., 2002). A single XML compliant 
representation language, called RRL, has been developed for 
representing the Dialogue Script at its various stages of com- 
pletion (Piwek et al., 2002). 

The following is a transcript of a dialogue fragment which 
the system currently generates (Note that this is only the 
text. The system actually produces spoken dialogue accom- 
panied by gestures of the embodied agents which perform 
the script): 

Seller: Hello! How can I help? 

Buyer: Can you tell me something about this car? 

Seller: It is very comfortable. 

Seller: It has leather seats. 

Buyer: How much does it consume? 

Seller: It consumes 8 liters per 60 miles. 



Buyer: I see. 
Etc. 

Here, we focus on the representation of this dialogue after it 
has been processed by the Dialogue Planning module. The 
RRL dialogue script consists of four parts: 

1. A representation of the initial common ground of the
interlocutors. This representation provides information 
for the generation of referring expressions. 

2. A representation of each of the participants of the 
dialogue. For instance, we have the following repre- 
sentation for the seller named Ritchie: 

<person id="ritchie">
  <realname firstname="Ritchie" title="Mr"/>
  <gender type="male"/>
  <appearance character=
    "http://neca.sysis.at/eroom/msagent/ritchie_hq/Ritchie_Off.acf"/>
  <voice name="us2">
    <prosody pitch="-20%" rate="-10%"/>
  </voice>
  <personality
    agreeableness="0.8"
    extraversion="0.8"
    neuroticism="0.2"
    politeness="polite"/>
  <domainSpecificAttr
    role="seller"
    x-position="70"
    y-position="200"/>
</person>

3. A representation of the dialogue acts. Each act 
is associated with attributes, some of which are optional,
specifying its type, speaker, addressees, semantic
content (in terms of discourse representation structures,
Kamp & Reyle, 1993), what it is a reaction to (in
terms of conversation analytical adjacency pairs) and 
the emotions with which it is to be expressed. The fol- 
lowing is the representation for the dialogue act corre- 
sponding with the sentence 'It has leather seats': 

<dialogueAct id="v_4">
  <domainSpecificAttr type="inform"/>
  <speaker id="ritchie"/>
  <addressee id="tina"/>
  <semanticContent id="d_4">
    <drs id="d_3">
      <ternaryCond
        argOne="x_1"
        argThree="true"
        argTwo="leather_seats"
        id="c_4"
        pred="attribute"/>
    </drs>
  </semanticContent>
  <reactionTo id="v_3"/>
</dialogueAct>

4. The fourth component of the RRL representation of
the dialogue script consists of the temporal ordering of
the dialogue acts:

<temporalOrdering>
  <sequence>
    <act id="v_1"/>
    <act id="v_2"/>
    <act id="v_3"/>
    <act id="v_4"/>
  </sequence>
</temporalOrdering>
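To make the structure of such a script concrete, the following sketch reads the dialogue acts and their temporal ordering with Python's standard XML library. It is an illustration only: the `<dialogueScript>` wrapper element and the function name `read_script` are assumptions made here, since the fragments above do not show the enclosing RRL element, and the semantic content is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The <dialogueScript> root is a hypothetical wrapper added for this sketch.
RRL_FRAGMENT = """
<dialogueScript>
  <dialogueAct id="v_4">
    <domainSpecificAttr type="inform"/>
    <speaker id="ritchie"/>
    <addressee id="tina"/>
    <reactionTo id="v_3"/>
  </dialogueAct>
  <temporalOrdering>
    <sequence>
      <act id="v_1"/>
      <act id="v_2"/>
      <act id="v_3"/>
      <act id="v_4"/>
    </sequence>
  </temporalOrdering>
</dialogueScript>
"""


def read_script(xml_text):
    """Return (ordered act ids, {act id: summary of its attributes})."""
    root = ET.fromstring(xml_text)
    order = [act.get("id")
             for act in root.findall("temporalOrdering/sequence/act")]
    acts = {}
    for act in root.findall("dialogueAct"):
        reaction = act.find("reactionTo")  # optional attribute
        acts[act.get("id")] = {
            "type": act.find("domainSpecificAttr").get("type"),
            "speaker": act.find("speaker").get("id"),
            "addressee": act.find("addressee").get("id"),
            "reaction_to": reaction.get("id") if reaction is not None else None,
        }
    return order, acts
```

Because the ordering is stored separately from the acts themselves, a revision module can in principle reorder, merge or add acts without touching their content.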

Enforcing global constraints in dialogue 

Here we want to examine how to adapt the system so that it 
can take global constraints into account. We have seen in the 
preceding section that the Dialogue Scripting approach is 
best suited for the control of global dialogue properties. The 
NECA system is based on Dialogue Scripting: each mod- 
ule in the pipeline operates as a single author/editor which 
creates/elaborates the Dialogue Script. Within the Dialogue 
Scripting approach various methods for controlling global 
properties can be employed. Earlier on, we discussed moni- 
toring, estimation and revision as approaches that have been 
employed in text generation. For the purpose of this paper, 
we limit our attention to the revision approach. 

There are a number of reasons for this choice. Firstly, 
monitoring is tailored to left-to-right processing, whereas 
the dialogue scripting approach is not constrained in this 
way. Moreover, monitoring can only work well if the num- 
ber of decisions relating to each constraint is very large, 
since this gives the system many opportunities for 'changing
course'. But even if a single-pass monitoring approach
could work well at the level of dialogue, it would tend to
complicate the design and maintenance of the system (cf.
Callaway & Lester 1997, Reiter 2000, Robin & McKeown
1996).³ Having a separate revision module also allows for
a more straightforward division of labour between multiple
system developers. (In NECA, for example, the dialogue
planner and the revision system are developed at different 
sites.) 

Furthermore, Reiter (2000) argues that revision compares 
favourably with other techniques for satisfying size con- 
straints (a specific type of global constraint) in text gener- 
ation. He compares revision with heuristic size estimators 
(for predicting the size of the text on the basis of the message 
to be conveyed) and multiple solution pipelines. Robin and 
McKeown (1996) discuss the implementation of revision 
in a summarization system for quantitative data (STREAK).
They carried out two evaluations which they argue show that
their revision-based approach covers a significantly larger
proportion of the structures found in their corpus than a traditional
single-pass generation model. Additionally, they
claim that their evaluation shows that it is easier to extend
the revision-based approach to new data.

³We could, for instance, have included the aggregation and insertion
operations (see below) directly in our dialogue manager, but
this would have complicated the dialogue planner rules.

Our approach to revision differs from other approaches. 
Firstly, our revisions are carried out on the abstract dialogue 
plan, before linguistic realization. Although Callaway &
Lester also carry out their revision operations on abstract 
representations of sentences, these are obtained by first gen- 
erating concrete sentences and then abstracting again over ir- 
relevant details. Instead of first fully generating and then ab- 
stracting, we follow an approach of partial generation. Sec- 
ondly, Reiter and Callaway & Lester focus on a single type 
of constraint. In this respect, our work is more similar to 
that of Hovy, where different types of potentially conflict- 
ing constraints are considered. To our knowledge, we are 
the first to propose revision operations on dialogue structure 
as opposed to discourse or sentence structure. Ultimately, 
of course, these different types of revision ought to be addressed
through one common approach.

Two constraints and a revision problem 

To illustrate the issues, let us consider two global constraints 
on dialogue: 

• Number of turns in the dialogue (TURN): maximal (MAX)
or minimal (MIN)

• Degree of emphasis (EMPH): maximal (MAX) or minimal
(MIN)

For the moment, we keep the constraints as simple as possible
and assume that they can only take extreme values (MAX
or MIN).

Furthermore, we introduce two revision operations on the 
output of the dialogue planner: 

• Adjacency Pair Aggregation (Aggr)

Operation: Given the adjacency pairs $A = (A_1, A_2)$
and $B = (B_1, B_2)$ in the input, create
$A+B = (A_1+B_1, A_2+B_2)$.

Precondition: A and B are about the same value dimension.

Example: A = (Does it have airbags? Yes),

B = (Does it have ABS? Yes),

A+B = (Does it have airbags and ABS? Yes)

Comment: The shared value dimension is security.

• Adjacency Pair Insertion (Insert) 

Operation: Given adjacency pair $A = (A_1, A_2)$ in the
input, 1. create adjacency pair $B = (B_1, B_2)$ which
is a clarificatory subdialogue about the information
exchanged in A and 2. insert B after A, resulting in
$(AB) = (A_1, A_2)(B_1, B_2)$.

Precondition: The information exchanged in A is 
marked for emphasis. 



Example: A = (Does it have leather seats? Yes). As- 
sume that comfort is positively correlated with hav- 
ing leather seats and that the user has indicated that 
the customer prefers comfortable cars. On the basis 
of this, the information exchanged in A is marked for 
emphasis. The text after revision is: (AB) = Does it 
have leather seats? Yes. Real leather? Yes, genuine 
leather seats. 

Comment: Piwek & Van Deemter (2002) contains 
examples of how human authors of scripted dialogue 
appear to use sub-dialogues for emphasis. 
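The two operations can be sketched at the level of abstract content as follows. This is a hedged illustration rather than the NECA implementation: the `AdjacencyPair` class, the encoding of content as tuples of strings such as `ask(airbags)`, and the `check(...)` and `reconfirm(...)` labels for the clarificatory subdialogue are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdjacencyPair:
    first: Tuple[str, ...]     # content of the first part, e.g. a question
    second: Tuple[str, ...]    # content of the second part, e.g. the answer
    dimension: str             # value dimension the pair is about
    emphasis: bool = False     # is the exchanged information marked for emphasis?


def aggr(a: AdjacencyPair, b: AdjacencyPair) -> AdjacencyPair:
    """A+B = (A1+B1, A2+B2); precondition: same value dimension."""
    if a.dimension != b.dimension:
        raise ValueError("AGGR requires a shared value dimension")
    return AdjacencyPair(a.first + b.first, a.second + b.second,
                         a.dimension, a.emphasis or b.emphasis)


def insert(a: AdjacencyPair) -> Tuple[AdjacencyPair, AdjacencyPair]:
    """Return (A, B), where B is a clarificatory subdialogue about A;
    precondition: the information exchanged in A is marked for emphasis."""
    if not a.emphasis:
        raise ValueError("INSERT requires information marked for emphasis")
    b = AdjacencyPair(
        first=tuple(f"check({item})" for item in a.second),
        second=tuple(f"reconfirm({item})" for item in a.second),
        dimension=a.dimension,
    )
    return a, b
```

Both functions operate on plan-level content only; how `ask(airbags)` and `ask(abs)` are merged into one question is left to the later realization stage.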

In the definitions of the two operations we use the notion of 
an adjacency pair which is common in Conversation Anal- 
ysis. The idea is that the first and second part of the pair are 
connected by the relation of conditional relevance (e.g., a 
pair consisting of a question and an answer): 'When one utterance
(A) is conditionally relevant to another (S), then the
occurrence of S provides for the relevance of the occurrence
of A' (Schegloff, 1972:76).

Let us now describe our revision problem. We have an 
initial dialogue plan $dp_1$, produced by the dialogue planner.
Before it is passed on to the multi-modal natural language
generator we want to apply the revision operations Aggr
and Insert in such a way that the resulting dialogue plan
$dp_2$ optimally satisfies the constraints for TURN and EMPH.
In total, there are four possible constraint settings:

1. TURN = MAX and EMPH = MAX.

2. TURN = MAX and EMPH = MIN.

3. TURN = MIN and EMPH = MIN.

4. TURN = MIN and EMPH = MAX.

Sequential revision 

Let us look at two alternative ways in which these constraint 
settings might be satisfied. The first is simple-minded but 
efficient: First, one operation is applied as often as needed 
and then the same is done for the other operation. We as- 
sume that insertion correlates with EMPH and aggregation 
correlates with TURN. If a constraint is set to MAX, the op- 
eration is performed as often as possible; if the constraint 
is set to MIN, the operation is not applied at all. Note that
this procedure will always terminate, given that our initial 
dialogue plan contains only a finite number of pieces of in- 
formation that are marked for emphasis and there is only a 
finite number of adjacency pairs which share a value dimen- 
sion. Hence the preconditions of the operations can only be 
satisfied a finite number of times. 

There are, however, complications, since constraints may 
be interdependent. One type of problem obtains when one 
operation affects (i.e. creates or destroys) the preconditions 
for another. Suppose, for example, our setting is (TURN =
MAX, EMPH = MAX), while aggregation is performed before
insertion. In this case, a less than maximal number of
aggregations would result, since insertion can introduce new 
candidates for aggregation. This type of problem can usually 
be finessed by finding an 'optimal' ordering between operations:
if insertion precedes aggregation, both constraints of
our example situation can be satisfied. Unfortunately, however,
there is another type of problem which cannot be finessed
so easily. Suppose, for example, our setting is (TURN
= MIN, EMPH = MAX), and observe that insertion positively
affects the number of turns as well as the degree of emphasis, 
affecting the two constraints (TURN = MIN, EMPH = MAX) 
in opposite ways. In such a case it is unclear what the best 
strategy is: the algorithm might either maximize the num- 
ber of insertions, trying to maximize emphasis, or minimize 
them, trying to minimize the number of turns. To tackle 
problems of both kinds, an approach is needed that is able to 
make trade-offs between conflicting constraints. 
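The effect of operation ordering on sequential revision can be illustrated with a small sketch. It rests on several simplifying assumptions that are ours and not part of the system description: a plan is a tuple of `Pair` records, insertion consumes the emphasis mark of the pair it elaborates (which guarantees termination), and aggregation may merge any two pairs that share a value dimension and are of the same kind (both ordinary pairs or both clarificatory subdialogues).

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Pair(NamedTuple):
    label: str
    dimension: str
    emph: bool = False      # marked for emphasis and not yet elaborated
    clarify: bool = False   # True for subdialogues added by insertion


Plan = Tuple[Pair, ...]


def insert_once(plan: Plan) -> Optional[Plan]:
    """Add a clarificatory pair after the first pair still marked for emphasis."""
    for i, pair in enumerate(plan):
        if pair.emph:
            sub = Pair(pair.label + "?", pair.dimension, clarify=True)
            return plan[:i] + (pair._replace(emph=False), sub) + plan[i + 1:]
    return None  # precondition satisfied nowhere


def aggregate_once(plan: Plan) -> Optional[Plan]:
    """Merge the first two pairs of the same kind that share a value dimension."""
    for i, a in enumerate(plan):
        for j in range(i + 1, len(plan)):
            b = plan[j]
            if a.dimension == b.dimension and a.clarify == b.clarify:
                merged = Pair(f"{a.label}+{b.label}", a.dimension,
                              a.emph or b.emph, a.clarify)
                return plan[:i] + (merged,) + plan[i + 1:j] + plan[j + 1:]
    return None


def exhaust(plan: Plan, operation: Callable[[Plan], Optional[Plan]]):
    """Apply one operation as often as its precondition allows."""
    count = 0
    while True:
        revised = operation(plan)
        if revised is None:
            return plan, count
        plan, count = revised, count + 1


def sequential_revision(plan: Plan, order: List[Callable]):
    """Exhaust each operation in turn; return the plan and application counts."""
    counts: Dict[str, int] = {}
    for operation in order:
        plan, counts[operation.__name__] = exhaust(plan, operation)
    return plan, counts
```

For a plan with two emphasized pairs about comfort, running insertion first yields two insertions and two aggregations under these assumptions, whereas running aggregation first yields only one of each: the inserted subdialogues are themselves candidates for aggregation, but only if they exist by the time aggregation runs.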

A 'Generate and Test' approach 

To tackle both these problems, we propose a 'generate and 
test' approach to the revision problem. The algorithm proceeds
as follows:

1. We use a conventional top-down planner to produce a single
dialogue plan $dp_{start}$.

2. Next, we generate all possible plans that can be obtained
by applying the operations INSERT and AGGR zero or
more times, in any order, to $dp_{start}$. Let us call this set of
all possible output plans $DP_{out}$.

3. Each member of $DP_{out}$ is assigned a score for the TURN
and EMPH constraints. Each dialogue plan $dp \in DP_{out}$ is
characterized as a tuple consisting of the TURN and EMPH
scores $(S_t, S_e)$. The TURN score $S_t$ depends on the
number of turns of the dialogue plan. The EMPH score
$S_e$ depends on the number of emphasis subdialogues in
the dialogue plan that were added during revision. We
assume that our scores are normalized, so they each occupy
a value on the interval [0, 100], i.e., satisfaction of
the constraint from 0% to 100%. 100% means that there
is no alternative dp which does better.

4. Finally, on the basis of the scores assigned to the plans in
$DP_{out}$ and according to some arbitration plan, we select
an optimal outcome or set of optimal outcomes, i.e., a
unique solution or a set of solutions.
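Steps 2 and 3 can be sketched as an exhaustive search followed by scoring. The sketch is illustrative and makes assumptions of its own: the `Pair` plan representation, the rule that insertion consumes an emphasis mark, the counting of two turns per adjacency pair, and the linear normalization between the worst and best values found in the set of output plans.

```python
from typing import Dict, Iterator, NamedTuple, Set, Tuple


class Pair(NamedTuple):
    label: str
    dimension: str
    emph: bool = False      # marked for emphasis and not yet elaborated
    clarify: bool = False   # True for subdialogues added by insertion


Plan = Tuple[Pair, ...]


def successors(plan: Plan) -> Iterator[Plan]:
    """All plans reachable by one application of INSERT or AGGR."""
    for i, a in enumerate(plan):
        if a.emph:  # INSERT after a
            sub = Pair(a.label + "?", a.dimension, clarify=True)
            yield plan[:i] + (a._replace(emph=False), sub) + plan[i + 1:]
        for j in range(i + 1, len(plan)):  # AGGR of a with a later pair
            b = plan[j]
            if a.dimension == b.dimension and a.clarify == b.clarify:
                merged = Pair(f"{a.label}+{b.label}", a.dimension,
                              a.emph or b.emph, a.clarify)
                yield plan[:i] + (merged,) + plan[i + 1:j] + plan[j + 1:]


def all_plans(start: Plan) -> Set[Plan]:
    """Step 2: every plan obtainable by zero or more revisions, in any order."""
    seen, frontier = {start}, [start]
    while frontier:
        for nxt in successors(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def normalize(value: float, worst: float, best: float) -> float:
    """Map value linearly onto [0, 100], with 100 for the best value found."""
    if best == worst:
        return 100.0
    return 100.0 * abs(value - worst) / abs(best - worst)


def score_plans(plans: Set[Plan], turn_setting: str,
                emph_setting: str) -> Dict[Plan, Tuple[float, float]]:
    """Step 3: assign each plan its (S_t, S_e) pair for the given settings."""
    turns = {p: 2 * len(p) for p in plans}           # two turns per pair
    emph = {p: sum(pair.clarify for pair in p) for p in plans}

    def worst_best(values, setting):
        low, high = min(values), max(values)
        return (low, high) if setting == "MAX" else (high, low)

    t_worst, t_best = worst_best(turns.values(), turn_setting)
    e_worst, e_best = worst_best(emph.values(), emph_setting)
    return {p: (normalize(turns[p], t_worst, t_best),
                normalize(emph[p], e_worst, e_best)) for p in plans}
```

The search terminates because every revision either consumes an emphasis mark or reduces the number of pairs; for larger plans the set of output plans grows quickly, so this brute-force enumeration should be read as a specification rather than as an efficient procedure.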

At this point, one might ask how we decide which operations
are included in the conventional top-down planner and
which ones are deemed revision operations. To answer this 
question, it will be useful to elaborate a bit on our underlying 
assumptions about dialogue. 

In classical Discourse Theory (e.g., Stenström, 1994)
conversations typically consist of three distinguishable
phases: an opening, a body and a closing, each of which has
a specific purpose. According to, for instance, Clark (1996),
individual dialogue acts belong to different tracks depend- 
ing on how directly they contribute to the purpose of the 
dialogue phase in which they occur. On track 1 we have di- 
alogue acts which are intended to immediately contribute to 
furthering this purpose, for instance, the buyer's asking for 
the price of the car to the seller or the buyer's introducing 
him- or herself to the seller."' Metacoimnunication about the 

"'in our tentative view, greetings occur at the level of track 1, 
since they directly further the purpose of the opening and are, there- 



communication on track 1 takes place at the level track 2. 
This includes monitoring the success of the communication, 
attempting to fix communication problems, etc. 

For our purposes, the distinction between acts on track 1 
and 2 is a useful one, since acts on track 2 can be viewed 
as mere decorations of the acts which further the purpose of 
the conversation on track 1. For example, if we omit the ut- 
terances on track 2 the remaining dialogue script still makes 
sense (cf. Piwek & Van Deemter, 2002), whereas remov- 
ing utterances from track 1 does not have the same effect. 
Consider, for instance, the foUowing exchange: 

1. Buyer: How much does the car cost? 

2. Seller: 15.000 Euro. 

3. Buyer: 15.000? 

4. Seller: Yes, only 15.000. 

The acts on track 1 (1. and 2.) make sense on their own, 
whereas those on track 2 (3. and 4.) do not. For this reason, 
acts on track 1 are dealt with by the dialogue planner, while 
acts on track 2 are inserted at the revision stage, by means 
of the operation Insert. 

The operation Aggr is an instance of an aggregation op- 
eration on the dialogue level. Aggregation operations are 
typically dealt with as involving revision: two or more struc- 
tures are merged/revised into one new structure. Our Aggr 
allows us to reorganize the location of dialogue acts. It does 
not add or remove any dialogue acts. The precondition on 
AGGR, which stipulates that only dialogue acts which deal 
with the same value dimension can be aggregated, guards 
against erratic reorganizations of the dialogue, destroying 
smooth shifts from one topic (value dimension) to another.^ 

A second issue which the current sketch for an algorithm 
raises is that of the choice of an 'arbitration plan' for select- 
ing the solution or set of solutions from DPout ■ Fortunately, 
this is a well-known problem in decision theory and more 
specifically game theory. One would like an arbitration plan 
to satisfy certain criteria which define a fair balancing of 
different constraints. One set of such criteria is due to John 
Nash, who proposed that a solution should satisfy the fol- 
lowing four axioms (Nash, 1950): 

1. Linear Invariance: If one transforms the scores for 
either constraint by a positive linear function, then the so- 
lution should be subject to the same transformation. This 
axiom derives from the fact that the score/utility functions 
in game theory are normally taken to be an interval scale. 
These are invariant only under positive linear transforma- 
tions. 

2. Symmetry: If for each outcome associated with a pair
of scores $(x, y)$ there is another outcome $(y, x)$, then the
solution should consist of a pair of identical scores $(z, z)$.
In words, the constraints are treated as equals with respect to
each other.



fore, distinct from the metacommunication which takes place on 
track 2. Alternative views would be equally easy to model. 

'More generally, the generation of texts with smooth topic 
shifts can be seen as a constraint satisfaction problem. See, for
instance, Kibble & Power (2000).



3. Independence of Irrelevant Alternatives: Suppose
we have two different outcome sets A and B. Assume
also that $A \subset B$. If the solution for B is a member
of A, then this solution should also be a solution for A. In
words, the unavailability of non-solution outcomes should
not influence the final solution.

4. Pareto Optimality: The solution should be Pareto
Optimal. A pair $(S_t, S_e) \in DP_{out}$ is Pareto Optimal
iff it is impossible to find another pair $(S'_t, S'_e) \in DP_{out}$
such that:

(a) $S'_t = S_t$ and $S'_e > S_e$, or

(b) $S'_e = S_e$ and $S'_t > S_t$, or

(c) $S'_e > S_e$ and $S'_t > S_t$.

For our purpose, the notion of Pareto Optimality is particu- 
larly interesting. A pair is Pareto Optimal if and only if it is 
impossible to improve one of its elements without making 
the other element worse off. Unfortunately, Pareto Optimality
does not help us to identify a unique solution. For example,
if $DP_{out}$ contains only the two pairs dp1 = (100, 10) and
dp2 = (50, 50), then both pairs are Pareto Optimal.
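
The definition can be checked mechanically. The following Python
sketch is illustrative only: it assumes that the outcome set is a
finite list of score pairs, and the names dominates and
pareto_optimal are ours. A third, dominated pair (40, 40) is added
to the two pairs of the example to show the filter at work.

```python
from typing import List, Tuple

Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both scores and differs from b.

    This covers conditions (a), (b) and (c) of the definition:
    equal on one score and better on the other, or better on both.
    """
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_optimal(outcomes: List[Score]) -> List[Score]:
    """Keep the pairs that no other pair in the outcome set dominates."""
    return [p for p in outcomes if not any(dominates(q, p) for q in outcomes)]


dp_out = [(100, 10), (50, 50), (40, 40)]
print(pareto_optimal(dp_out))  # [(100, 10), (50, 50)]
```

The pair (40, 40) is discarded because (50, 50) is better on both
scores, while neither of the two remaining pairs dominates the
other.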

Nash came up with an arbitration plan which satisfies not
only Pareto Optimality, but all four of the proposed axioms.
The idea is that by satisfying the four axioms, Nash's
arbitration plan provides a 'fair' solution to the problem of
maximizing the degree to which both (in general: all) constraints
are satisfied, that is, a solution that treats the two constraints
evenhandedly. According to the Nash arbitration plan, the
optimal solution is the solution $(S_t, S_e)$ with the highest
value for $S_t \times S_e$. This plan causes dp2 to win (its
product is 2500, against 1000 for dp1), which is a
desirable outcome, in our opinion.

The Nash plan is guaranteed to choose a solution that is 
Pareto Optimal. The same is true for a plan that maximizes 
the sum instead of the product of the scores, but this would
fail to punish a treatment that favoured one constraint over
the other: in the case of dp1 = (100, 10) and dp2 = (50, 50),
the sum (110 against 100) would select dp1.
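
The contrast between the two plans can be made concrete with a
short Python sketch. It is a simplification under stated
assumptions: scores are non-negative, zero is taken as the implicit
reference point of the product, and ties are broken by list order.
The function names nash_arbitration and sum_arbitration are
hypothetical.

```python
from typing import List, Tuple

Score = Tuple[float, float]


def nash_arbitration(outcomes: List[Score]) -> Score:
    """Select the pair with the highest product of its two scores."""
    return max(outcomes, key=lambda p: p[0] * p[1])


def sum_arbitration(outcomes: List[Score]) -> Score:
    """Select the pair with the highest sum of its two scores."""
    return max(outcomes, key=lambda p: p[0] + p[1])


dp1 = (100, 10)
dp2 = (50, 50)
dp_out = [dp1, dp2]

print(nash_arbitration(dp_out))  # (50, 50): 2500 beats 1000
print(sum_arbitration(dp_out))   # (100, 10): 110 beats 100
```

Both functions return a Pareto Optimal pair here, but only the
product rewards the balanced outcome dp2.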

Conclusions 

We have discussed various existing methods for controlling 
global properties of generated text (such as length and style). 
Having done this, we have focused on generated dialogue,
offering a number of arguments in favour of the Scripted
Dialogue approach. Building on observations in the literature,
we went on to make a case for revision approaches to
controlling global properties of scripted dialogue. More
specifically, we have sketched the potential for a revision approach
to dialogue generation in the NECA system. In our
discussion of NECA, we have outlined a new 'generate and test'
approach that would put well-known techniques from decision
theory and game theory to a novel use, thereby exemplifying
a recent trend in formal and computational linguistics
(Rubinstein 2000).